The main functions for working with the $\chi^2$-distribution are chi2.rvs(), chi2.pdf(), chi2.cdf() and chi2.ppf() from the scipy.stats package. The chi2.pdf() function gives the density, the chi2.cdf() function gives the cumulative distribution function, the chi2.ppf() function gives the quantile function (the percent point function, which is the inverse of the cdf), and the chi2.rvs() function generates random deviates.
We use chi2.pdf(x, df, loc=0, scale=1) to calculate the density at the integer values 4 to 8 of a $\chi^2$-curve with $df=7$.
# First, let's import all the needed libraries.
import numpy as np
import matplotlib.pyplot as plt
import scipy.stats as stats
x = np.arange(4, 8.01, 1)
stats.chi2.pdf(x, df=7)
array([0.11518073, 0.12204152, 0.11676522, 0.10411977, 0.08817914])
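As a cross-check, the values above can be compared with the closed-form density of the $\chi^2$-distribution with $k$ degrees of freedom, $f(x; k) = \frac{x^{k/2-1} e^{-x/2}}{2^{k/2}\,\Gamma(k/2)}$ for $x > 0$. The following is a minimal sketch; the variable names manual_pdf and scipy_pdf are our own and only serve the illustration.

```python
import numpy as np
from scipy import special, stats

df = 7
x = np.arange(4, 9)  # integer values 4 to 8

# Closed-form chi-squared density, written out term by term
# (manual_pdf is an illustrative name, not part of scipy)
manual_pdf = (
    x ** (df / 2 - 1) * np.exp(-x / 2)
    / (2 ** (df / 2) * special.gamma(df / 2))
)

# Density as computed by scipy
scipy_pdf = stats.chi2.pdf(x, df=df)

# Compare both results up to floating point tolerance
print(np.allclose(manual_pdf, scipy_pdf))
```

Both approaches should agree up to floating point precision, which is why the comparison uses np.allclose() rather than an exact equality test.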
We use chi2.cdf() to calculate the area under the curve for the intervals $[0,6]$ and $[6, \infty)$ of a $\chi^2$-curve with $df=7$. Further, we ask Python whether the areas over $[0,6]$ and $[6, \infty)$ sum to 1:
# interval [0, 6]
stats.chi2.cdf(6, df=7)
0.4602506496044429
# interval [6, inf)
1 - stats.chi2.cdf(6, df=7)
0.539749350395557
(1 - stats.chi2.cdf(6, df=7)) + stats.chi2.cdf(6, df=7) == 1
True
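The upper-tail area can also be obtained directly with the survival function chi2.sf(), which scipy.stats defines as 1 - cdf. Since exact equality tests on floating point numbers can fail because of rounding, the sketch below compares the sum with np.isclose() instead of ==. The names lower and upper are chosen for this illustration only.

```python
import numpy as np
from scipy import stats

# area over [0, 6]
lower = stats.chi2.cdf(6, df=7)

# area over [6, inf), computed with the survival function (1 - cdf)
upper = stats.chi2.sf(6, df=7)

print(lower, upper)

# tolerance-based check that both areas sum to 1
print(np.isclose(lower + upper, 1.0))
```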
We use chi2.ppf() to calculate the quantile $q$ of a $\chi^2$-curve with $df=7$ for a given area (= probability) under the curve, here $p = 0.25, 0.5, 0.75$ and $0.999$. The probability passed to chi2.ppf() is the lower-tail area, that is, the area over the interval $[0, q]$.
stats.chi2.ppf(0.25, 7)
4.2548521835465145
stats.chi2.ppf(0.5, 7)
6.345811195521515
stats.chi2.ppf(0.75, 7)
9.037147547908143
stats.chi2.ppf(0.999, 7)
24.321886347856854
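Because chi2.ppf() is the inverse of chi2.cdf(), feeding the quantiles back into chi2.cdf() should return the original probabilities. The following sketch passes all four probabilities at once as an array; the names p, q and p_back are our own.

```python
import numpy as np
from scipy import stats

# lower-tail probabilities
p = np.array([0.25, 0.5, 0.75, 0.999])

# quantiles of the chi-squared distribution with df=7
q = stats.chi2.ppf(p, df=7)
print(q)

# round trip: the cdf evaluated at the quantiles recovers the probabilities
p_back = stats.chi2.cdf(q, df=7)
print(np.allclose(p_back, p))
```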
We use the chi2.rvs(df, loc=0, scale=1, size=1) function to generate 100,000 random values (the size argument) from the $\chi^2$-distribution with $df=7$. Thereafter we plot a histogram and compare it to the probability density function of the $\chi^2$-distribution with $df=7$ (orange line).
rand_chi2_samples = stats.chi2.rvs(df=7, size=100000)
plt.figure(figsize=(10, 5))
plt.hist(
    rand_chi2_samples,
    density=True,
    color="lightgrey",
    edgecolor="darkgrey",
    bins="scott",
)
plt.title("Histogram for $\\chi^2$-distributions with 7 degrees of freedom (df)")
plt.plot(
    np.arange(0, 20, 0.1),
    stats.chi2.pdf(np.arange(0, 20, 0.1), df=7),
    "-",
    linewidth=2,
    color="orange",
)
plt.xlabel("samples")
plt.ylabel("Density")
plt.xlim(0, 16)
plt.show()
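Besides the visual comparison, the random sample can be checked numerically: a $\chi^2$-distribution with $df$ degrees of freedom has mean $df$ and variance $2 \cdot df$. The sketch below assumes a seeded generator passed via the random_state argument so that the result is reproducible; the seed 42 is an arbitrary choice and not part of the example above.

```python
import numpy as np
from scipy import stats

df = 7

# seeded generator for reproducibility (the seed is an arbitrary assumption)
rng = np.random.default_rng(42)
samples = stats.chi2.rvs(df=df, size=100_000, random_state=rng)

# sample moments versus theoretical moments
print("sample mean:    ", samples.mean(), " theoretical:", df)
print("sample variance:", samples.var(ddof=1), " theoretical:", 2 * df)
```

With 100,000 values the sample mean and variance should lie close to 7 and 14, though they will not match exactly.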
Citation
The E-Learning project SOGA-Py was developed at the Department of Earth Sciences by Annette Rudolph, Joachim Krois and Kai Hartmann. You can reach us by email at soga[at]zedat.fu-berlin.de.
Please cite as follows: Rudolph, A., Krois, J., Hartmann, K. (2023): Statistics and Geodata Analysis using Python (SOGA-Py). Department of Earth Sciences, Freie Universitaet Berlin.